Quick Start

This guide will help you get started with the LW-BenchHub for kitchen manipulation tasks. The framework supports multiple robots, teleoperation data collection, trajectory replay, and reinforcement learning training.

Prerequisites

Before starting, ensure you have properly installed the lw_benchhub environment following the installation instructions. You should have:

  • NVIDIA Isaac Lab properly configured
  • All required dependencies installed via install.sh
  • GPU drivers and CUDA properly set up

Teleoperation Data Collection

LW-BenchHub provides a comprehensive teleoperation system for collecting demonstration data in simulation kitchen environments. You can control various robots using different input devices to perform manipulation tasks.

Quick Start (choose a configuration)

Run the teleoperation interface by specifying a config preset name (e.g., pandaomron or g1-hand):

python ./lw_benchhub/scripts/teleop/teleop_main.py --task_config g1-hand
# or
python ./lw_benchhub/scripts/teleop/teleop_main.py --task_config pandaomron

Tip: If you prefer using teleop.sh, edit the script to set task_config to a valid preset.

Configuration Overview (teleop_base.yml)

The teleoperation system is highly configurable. Here's the default configuration structure:

# =========================
# Basic Settings
# =========================
disable_fabric: false
num_envs: 1
device: cpu
sensitivity: 1.0
step_hz: 50

# =========================
# Robot Settings
# =========================
robot: PandaOmron-Rel
robot_scale: 1.0

# =========================
# Task & Scene Settings
# =========================
task: RobocasaBaseTask
scene_backend: robocasa
task_backend: robocasa
debug_assets: null
layout: robocasakitchen
sources:
- objaverse
- lightwheel
- aigen_objs
object_projects: []
usd_simplify: false
seed: 0

# =========================
# Data Collection Settings
# =========================
record: false
dataset_file: ./datasets/dataset.hdf5
num_demos: 1
enable_debug_log: false
continue_teleop_after_success: false

# =========================
# Teleoperation Device Settings
# =========================
enable_pinocchio: false
teleop_device: keyboard
relative_control: false
first_person_view: false
enable_rendering_optimization: true
enable_multiple_viewports: false
action_delay_async: false
action_delay_time_s: 0
action_delay_type: fixed
action_buffer_size: 100

# =========================
# Video Saving Settings
# =========================
enable_cameras: false
save_video: false
video_save_dir: ./videos
video_fps: 30

# =========================
# Object Placement Retry Settings
# =========================
max_scene_retry: 4
max_object_placement_retry: 3

resample_objects_placement_on_reset: true
resample_robot_placement_on_reset: true

Customizing Your Configuration

You can create your own configuration file in the configs directory (config file names must be globally unique across the entire project). Simply add the following line at the top of your new config file:

_base_: teleop_base

This inherits all settings from teleop_base.yml, and you can freely override any of them. For example, you can override parameters such as robot, layout, record, num_demos, or teleop_device to customize the scene, robot, control device, data recording, and more.
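
As an illustration (the file name and values below are hypothetical, chosen from the options documented on this page), a custom preset that records SO101 demonstrations could look like:

# configs/my_teleop.yml (hypothetical file name)
_base_: teleop_base

# Override only what you need; values are illustrative
teleop_device: so101leader
record: true
num_demos: 10
dataset_file: ./datasets/so101_demos.hdf5

Assuming presets are addressed by file name, you would then launch it with python ./lw_benchhub/scripts/teleop/teleop_main.py --task_config my_teleop.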

If you want to inherit multiple configuration files, simply modify the inheritance syntax as follows:

_base_: [teleop_base, config_A, config_B, ...]

Override priority follows the list order: the later a config appears in the list, the higher its priority.
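
For example (with hypothetical configs), if config_A sets num_demos: 5 and config_B sets num_demos: 10, then

_base_: [teleop_base, config_A, config_B]

results in num_demos: 10, since config_B appears later in the list and therefore wins.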

Available Input Devices

  • keyboard: Default input method with intuitive key mappings
  • dualhandtracking_abs: Two-hand tracking with retargeting (recommended for G1-Hand)
  • vr-controller: Supports standard VR controllers, including PICO and QUEST, for robot teleoperation with full button and directional control
  • vr-hand: Uses PICO/QUEST hand tracking to control the robot
  • so101leader: SO101 master-slave arm control
  • bi_so101leader: Dual-arm SO101 master-slave control
Key notes
  • For dualhandtracking_abs, control is performed using Vision Pro. For Vision Pro configuration and usage, please refer to the IsaacLab documentation: https://isaac-sim.github.io/IsaacLab/main/source/how-to/cloudxr_teleoperation.html#cloudxr-teleoperation
  • For vr-controller and vr-hand, we use Open-TeleVision to connect PICO/QUEST devices. For instructions on how to use Open-TeleVision, please see https://github.com/OpenTeleVision/TeleVision
  • Controller quick guide: Use the trigger to open/close the gripper, the joystick to move the base, and the grip button to toggle base movement mode (rotate in place or translate). By default, the right controller operates the robot. For certain robots (e.g., dual-arm or X7s waist), use the left controller's trigger or joystick for control.

Trajectory Replay

After collecting teleoperation data, you can replay and analyze the trajectories using two different replay modes. This is essential for validating data quality and understanding robot behavior.

State-Based Replay

State replay reproduces the exact state of all objects in the scene, including robot poses, object positions, and articulation states. This provides perfect reproduction of the recorded session.

python ./lw_benchhub/scripts/teleop/replay_demos.py --dataset_file "/path/to/your/dataset.hdf5" --enable_cameras

Use cases:

  • Analyzing manipulation strategies
  • Debugging data collection issues
  • Creating reference videos
  • Understanding object interactions

Action-Based Replay

Action replay tests the robot's ability to reproduce behaviors by replaying only the actions, allowing the simulation to determine the resulting states. This mode tests the robustness of the recorded behaviors.

Joint Target Replay

Replays the computed joint targets, allowing the robot's control system to track these targets:

python ./lw_benchhub/scripts/teleop/replay_action_demo.py \
--dataset_file /path/to/your/dataset.hdf5 \
--replay_mode joint_target \
--enable_cameras

Features:

  • Tests joint-level control directly, bypassing inverse kinematics
  • Validates simulation environment consistency and stability
  • Evaluates precise reproduction of joint trajectories

Action Replay

Replays the high-level actions through the full control pipeline (e.g., inverse kinematics + PD control):

python ./lw_benchhub/scripts/teleop/replay_action_demo.py \
--dataset_file /path/to/your/dataset.hdf5 \
--replay_mode action \
--enable_cameras

Features:

  • End-to-end action validation
  • Tests complete control pipeline
  • Evaluates task completion success

Replay Analysis

Both replay modes provide valuable insights:

  • Trajectory smoothness: Assess motion quality
  • Task success rate: Validate demonstration effectiveness
  • Control stability: Identify potential control issues
  • Data quality: Ensure consistent robot behavior

Reinforcement Learning

LW-BenchHub provides a complete RL pipeline integrated with SKRL. An RL run is defined by a YAML preset under configs/rl/skrl/ (e.g., g1_liftobj.yaml) that selects a robot, task, and scene. The environment itself is composed from a robot RL variant and a task RL config registered in lw_benchhub_rl/__init__.py.

Training a Policy

Launch RL training with the default preset:

bash train.sh   # default preset uses LiftObj (state variant)

The script sets GPU variables internally. To use a different preset/variant, edit train.sh or run the Python entry directly:

python ./lw_benchhub/scripts/rl/train.py \
--task_config lerobot_liftobj_state \
--headless

Advanced Training Options

Use your own preset under configs/rl/skrl/ (e.g., my_robot_my_task.yaml):

_base_: rl_base
task: MyTask
robot: MyRLRobot
layout: AnyLayout
rl: MyRLConfig
num_envs: 128
usd_simplify: true # Enable USD simplification
enable_cameras: false

Then train:

python ./lw_benchhub/scripts/rl/train.py --task_config my_robot_my_task --headless

RL configuration parameters

Below is a brief guide to the main parameters used by the RL presets and the agent configuration.

Environment preset (rl_base.yml)

remote_protocal: ipc         # restful or ipc
for_rl: true
concatenate_terms: false
disable_fabric: false # Disable fabric and use USD I/O operations
device: cuda:0

# Environment Configuration
num_envs: 10 # Number of environments to simulate
layout: robocasakitchen # Scene layout
sources: # Object source name
- objaverse
- lightwheel
- aigen_objs
object_projects: [] # Object project name
scene_backend: robocasa # Scene backend name
task_backend: robocasa # Task backend name
debug_assets: null # Debug assets name
robot: null # Robot cfg name
task: null # Task cfg name
rl: null # RL cfg name
variant: null # RL variant name, define when register the RL config
usd_simplify: false # Enable USD simplification
enable_rendering_optimization: true # Enable rendering optimization

# Asset and Export Paths
export_base_path: null # Export base path (replace with actual ASSET_PATH if defined in code)
robot_scale: 1.0 # Robot scale
first_person_view: false # Enable first-person view

# Reinforcement Learning Settings
video: false # Record videos during training
video_length: 200 # Length of recorded video (in steps)
video_interval: 2000 # Interval between video recordings (in steps)
seed: null # Seed used for the environment
distributed: false # Run training with multiple GPUs or nodes
max_iterations: null # Max RL policy training iterations

# ML Framework Settings
ml_framework: "torch" # The ML framework used for training
# Options: ["torch", "jax", "jax-numpy"]

algorithm: "PPO" # The RL algorithm used for training
# Options: ["AMP", "PPO", "IPPO", "MAPPO"]

# Checkpoint
checkpoint: null

# Play Settings
use_pretrained_checkpoint: false
real_time: false
enable_cameras: false
check_success: true
Key notes
  • Effective fabric usage is computed as use_fabric = not disable_fabric in the code.
  • num_envs trades off throughput vs. GPU memory.
  • algorithm selects which SKRL config key is used internally.

Task preset (e.g., g1_liftobj_state.yaml)

_base_: rl_base
task: LiftObj # Task name registered in lw_benchhub_rl
robot: G1-RL # Robot RL variant (action heads / EE frames)
layout: robocasakitchen-1-8 # Kitchen scene layout
rl: G1LiftObjStateRL # RL Config
num_envs: 1024 # Override of rl_base default
usd_simplify: true # Override for faster loading
enable_cameras: false # State variant does not require cameras (set true for Visual)
Key notes
  • The final environment is composed from the RL config registered in lw_benchhub_rl/__init__.py.
  • You can override any key from rl_base.yml in the task preset.

Agent configuration (SKRL) — example fields

models:
  separate: false # Single shared backbone for policy/value (false) or separate nets (true)
  policy:
    class: GaussianMixin # Stochastic Gaussian policy (continuous actions)
    clip_log_std: true # Clamp log_std to [min_log_std, max_log_std]
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net
        input: STATES # Input tensor spec (auto: STATES/ACTIONS)
        layers: [512, 256, 128, 64]
        activations: elu
    output: ACTIONS
  value:
    class: DeterministicMixin # Scalar value head
    network:
      - name: net
        input: STATES
        layers: [512, 256, 128, 64]
        activations: elu
    output: ONE

memory:
  class: RandomMemory # Rollout buffer backend
  memory_size: -1 # Auto = agent.rollouts

agent:
  class: PPO
  rollouts: 24 # Steps per environment before update (episode fragments)
  learning_epochs: 8 # Optimization epochs per update
  mini_batches: 4 # Gradient batches per epoch
  discount_factor: 0.99 # Gamma
  lambda: 0.95 # GAE lambda
  learning_rate: 1.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.01 # Target KL for LR adaptation
  state_preprocessor: RunningStandardScaler
  value_preprocessor: RunningStandardScaler
  random_timesteps: 0 # Warmup steps with random actions
  learning_starts: 0 # Start updates after N steps
  grad_norm_clip: 1.0
  ratio_clip: 0.2 # PPO epsilon clip
  value_clip: 0.2
  clip_predicted_values: true
  entropy_loss_scale: 0.001 # Entropy bonus
  value_loss_scale: 2.0
  kl_threshold: 0.0 # Additional KL early stop (if used)
  rewards_shaper_scale: 0.01 # Global reward scaling factor
  time_limit_bootstrap: false # Bootstrap value at time limit
  experiment:
    directory: "lift_obj" # Log root under policy/skrl/logs/
    experiment_name: "" # Optional suffix
    write_interval: auto
    checkpoint_interval: auto

trainer:
  class: SequentialTrainer
  timesteps: 72000 # Total environment steps (num_envs * steps)
  environment_info: log
Practical tips
  • Effective batch size per update roughly scales with rollouts * num_envs (and sequence length if recurrent); see the quick calculation below.
  • Increase entropy_loss_scale to encourage exploration; lower it for exploitation.
  • Use learning_rate_scheduler with kl_threshold to stabilize training automatically.
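
As a quick sanity check with the example values above (num_envs: 1024 from the g1_liftobj_state preset, rollouts: 24 and mini_batches: 4 from the agent config), a minimal Python sketch of the per-update sample counts:

# Rough PPO batch-size arithmetic for the example preset above
rollouts, num_envs, mini_batches = 24, 1024, 4
samples_per_update = rollouts * num_envs                     # 24 * 1024 = 24576 transitions per update
samples_per_minibatch = samples_per_update // mini_batches   # 24576 / 4 = 6144 per gradient step
print(samples_per_update, samples_per_minibatch)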

Additional RL Configuration (rl field) - Example

@rl_on(task=task_A)
@rl_on(task=task_B)
@rl_on(embodiment=robot_A)
@rl_on(embodiment=robot_B)
class TaskRobotRL(LwRL):
    # Example RL configuration for reward and observation tailored to a series of tasks
    # (task_A, task_B, ...) and a series of robots (robot_A, robot_B, ...).
    def __init__(self):
        super().__init__()
        self.policy_observation_cfg = TaskRobotPolicyObsCfg()
        self.rewards_cfg = RewardsCfg()
        self.events_cfg = EventCfg()

    def setup_env_config(self, orchestrator):
        # Set specific reward/observation configuration, link and eef names for your RL task
        pass

    def modify_env_cfg(self, env_cfg):
        # Set special environment configuration variables, such as the end-effector (ee) frame cfg
        pass

GPU Configuration

The training script automatically handles GPU allocation:

  • Single GPU: Uses one GPU for both environment and policy
  • Multi-GPU: Distributes environment and policy across different GPUs
  • CPU fallback: Supports CPU-only training for development (see the preset sketch below)
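
Since any key from rl_base.yml can be overridden in a preset (see the task preset notes above), a sketch of the relevant settings, with illustrative values:

# In your preset under configs/rl/skrl/ (illustrative values)
device: cuda:0        # or cpu for CPU-only development
distributed: false    # set to true to distribute training across multiple GPUs or nodes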

Policy Evaluation

Test your trained policies using the evaluation script:

bash eval.sh   # uses task_config=lerobot_liftobj_state_play by default

Or run custom evaluation:

python ./lw_benchhub/scripts/rl/play.py \
--task_config lerobot_liftobj_state_play
# Optional flags: --enable_cameras, --headless

Training Monitoring

  • Tensorboard: Monitor training progress and metrics (see the example command below)
  • Video logging: Automatic recording of policy performance
  • Checkpoint saving: Regular model checkpointing for recovery
  • Evaluation episodes: Periodic policy evaluation during training
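
For example, assuming the default experiment directory from the agent configuration above (directory: "lift_obj" under policy/skrl/logs/), TensorBoard can be pointed at the log root:

tensorboard --logdir ./policy/skrl/logs/lift_obj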

Next Steps

Once you're familiar with the basic workflow, explore these advanced features:

  1. Custom Tasks: Create new manipulation tasks by extending the base task classes
  2. Robot Integration: Add support for new robot configurations
  3. Multi-Modal Learning: Combine teleoperation data with RL training
  4. Scene Customization: Design custom kitchen layouts and scenarios
  5. Algorithm Integration: Implement new RL algorithms using the SKRL framework

For detailed information about each component, refer to the comprehensive documentation and API references.

Troubleshooting

Common Issues

  • GPU Memory: Reduce num_envs if encountering memory issues
  • Control Instability: Adjust sensitivity and step_hz parameters
  • Rendering Issues: Check Isaac Lab installation and graphics drivers
  • Data Collection: Ensure record: true is set and dataset_file points to a valid path (see the sketch below)
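
For the data collection case, a minimal sketch of the relevant teleop settings (the path is illustrative):

record: true
dataset_file: ./datasets/my_demos.hdf5   # must be a valid, writable path
num_demos: 1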

Performance Optimization

  • Use --headless for faster training without visualization
  • Enable --enable_cameras only when needed for vision-based policies
  • Optimize step_hz based on your hardware capabilities
  • Use appropriate device setting (cuda/cpu) for your setup

This framework provides a solid foundation for kitchen manipulation research, supporting everything from basic teleoperation to advanced reinforcement learning with multiple robot platforms.